Restricted Decontamination for the Imbalanced Training Sample Problem
Abstract
The problem of imbalanced training data in supervised methods is receiving growing attention. A training sample is imbalanced when one class is represented by many more examples than the others. This situation arises in several practical domains, and it has been observed to degrade classification accuracy considerably, in particular for patterns belonging to the less represented classes. In the present paper, we report experimental results indicating that it is advantageous to appropriately downsize the majority class while simultaneously enlarging the minority class, so as to balance the two classes. This is achieved by applying a modification of the previously proposed Decontamination methodology. The combination of this proposal with a weighted distance function is also explored.
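The abstract does not give the editing rule or the distance weighting, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes a two-class problem and a Wilson-style k-NN edit that is "restricted" to the majority class: a majority pattern whose k nearest neighbours are all minority is relabelled as minority, one whose neighbours are mostly (but not all) minority is removed, and minority patterns are never touched. The weighted distance is an assumed form, d_w = (n_i / n)^(1/m) · d_E, where n_i is the size of the training pattern's class, n the sample size and m the dimensionality. All function names (`restricted_decontamination`, `weighted_distance`, `nn_classify`) are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def k_nearest_labels(i, X, y, k):
    """Labels of the k nearest neighbours of X[i], excluding X[i] itself."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: euclidean(X[i], X[j]))
    return [y[j] for j in others[:k]]


def restricted_decontamination(X, y, majority, minority, k=3):
    """Hypothetical sketch: edit ONLY majority-class patterns.

    - all k neighbours are minority    -> relabel the pattern as minority
    - most (not all) are minority      -> remove the pattern
    - otherwise                        -> keep it unchanged
    Decisions are computed on the original sample, so the result does not
    depend on the order in which patterns are visited.
    """
    new_X, new_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        if label != majority:
            new_X.append(x)
            new_y.append(label)  # minority patterns are never edited
            continue
        votes = Counter(k_nearest_labels(i, X, y, k))
        if votes[minority] == k:
            new_X.append(x)
            new_y.append(minority)  # relabel: the minority class grows
        elif votes[minority] > k / 2:
            continue  # remove: the majority class shrinks
        else:
            new_X.append(x)
            new_y.append(label)
    return new_X, new_y


def weighted_distance(query, x, label, class_counts, n_total):
    """Assumed weighting: scale d_E by (n_i / n) ** (1 / m)."""
    m = len(query)
    return (class_counts[label] / n_total) ** (1.0 / m) * euclidean(query, x)


def nn_classify(query, X, y, weighted=False):
    """1-NN rule, optionally using the class-size weighted distance."""
    counts, n = Counter(y), len(y)

    def dist(j):
        if weighted:
            return weighted_distance(query, X[j], y[j], counts, n)
        return euclidean(query, X[j])

    return y[min(range(len(X)), key=dist)]


if __name__ == "__main__":
    # Toy 1-D sample: class 0 is the majority, class 1 the minority.
    X = [[0.0], [0.1], [0.2], [0.3], [0.4], [4.0], [5.0], [5.1], [5.2], [5.3]]
    y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    X2, y2 = restricted_decontamination(X, y, majority=0, minority=1, k=3)
    print(Counter(y), "->", Counter(y2))  # Counter({0: 7, 1: 3}) -> Counter({0: 5, 1: 4})
    # The weighted distance shrinks distances to the smaller class more.
    print(nn_classify([2.6], X2, y2), nn_classify([2.6], X2, y2, weighted=True))  # 0 1
```

In the toy run, the majority pattern at 4.0 is removed and the one at 5.0 is relabelled, so the majority class shrinks from 7 to 5 while the minority class grows from 3 to 4. Making the decisions against the unedited sample is a design choice of this sketch; an iterative variant that re-edits until nothing changes is equally plausible.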
Published: 2003